KMID : 1022420140060030155
Phonetics and Speech Sciences
2014, Volume 6, No. 3, pp. 155-164
Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging
Bang Jeong-Uk

Kwon Oh-Wook
Abstract
This paper proposes a new method for determining the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes, and position information is added to each morpheme. Submorpheme units are then obtained by splitting the morpheme units so as to maximize posterior probability terms. These terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model obtained from a 300-million-eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to achieve a 14.0% relative reduction in the syllable error rate.
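The final merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `merge_most_frequent_pairs` and `apply_merge`, the fixed `num_merges` budget, the stop-when-count-below-2 rule, and the tie-breaking order are all assumptions made for illustration, and the position information attached to morphemes is omitted.

```python
from collections import Counter


def apply_merge(units, pair):
    """Replace every adjacent occurrence of `pair` in `units` with one joined unit."""
    merged = []
    i = 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def merge_most_frequent_pairs(sentences, num_merges):
    """Sequentially merge the most frequent adjacent unit pair.

    `sentences` is a list of lists of submorpheme strings (hypothetical input
    format). `num_merges` is an assumed stopping criterion; the abstract does
    not state how many merges are performed.
    """
    corpus = [list(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of units across the whole corpus.
        pair_counts = Counter()
        for units in corpus:
            pair_counts.update(zip(units, units[1:]))
        if not pair_counts:
            break
        # Highest count wins; ties are broken by the pair itself (assumption).
        best_pair, best_count = max(
            pair_counts.items(), key=lambda kv: (kv[1], kv[0])
        )
        if best_count < 2:  # assumed cutoff: a pair seen once is not merged
            break
        merges.append(best_pair)
        corpus = [apply_merge(units, best_pair) for units in corpus]
    return corpus, merges


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
    merged, history = merge_most_frequent_pairs(toy, num_merges=2)
    print(merged)   # [['abc'], ['ab', 'd'], ['abc']]
    print(history)  # [('a', 'b'), ('ab', 'c')]
```

In the toy run, the pair ("a", "b") occurs three times and is merged first; the resulting unit "ab" then pairs with "c" twice and is merged next, so frequent sequences grow into longer recognition units while rare ones stay split.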
KEYWORD
Pseudomorpheme, Korean LVCSR
Listed journal information
Korea Research Foundation (KCI)